Intelligent Recommendation Systems Powered by Consensus Neural Networks: The Ultimate Solution for Finding Suitable Chiral Chromatographic Systems?

The selection of suitable combinations of chiral stationary phases (CSPs) and mobile phases (MPs) for the enantioresolution of chiral compounds is a complex issue that often requires considerable experimental effort and can lead to significant waste. Linking the structure of a chiral compound to a CSP/MP system suitable for its enantioseparation can be an effective solution to this problem. In this study, we evaluate algorithmic tools for this purpose. Our proposed consensus model, which uses multiple optimized artificial neural networks (ANNs), shows potential as an intelligent recommendation system (IRS) for ranking chromatographic systems suitable for the enantioresolution of chiral compounds with different molecular structures. To evaluate the IRS potential in a proof-of-concept stage, 56 structural descriptors for 56 structurally unrelated chiral compounds across 14 different families are considered. Chromatographic systems under study comprise 7 cellulose and amylose derivative CSPs and acetonitrile or methanol aqueous MPs (14 chromatographic systems in all). The ANNs are optimized using a fit-for-purpose version of the chaotic neural network algorithm with competitive learning (CCLNNA), a novel approach not previously applied in the chemical domain. CCLNNA is adapted to define the inner ANN complexity and perform feature selection of the structural descriptors. A customized target function evaluates the correctness of recommending the appropriate CSP/MP system. The ANN-consensus model exhibits no advisory failures and requires only a single experimental attempt to verify the IRS recommendation for complete enantioresolution. This outstanding performance highlights its potential to effectively resolve this problem.


■ INTRODUCTION
The study of the implications of chirality for life is an area of active research and debate due to the large number of chiral molecules that are part of living organisms and our daily lives. To conduct these studies, analytical methods for separating enantiomers of chiral molecules are paramount.
The use of chiral stationary phases (CSPs) in different chromatographic techniques is undoubtedly a good option for enantiomer separation. Among the wide variety of commercially available CSPs, those based on amylose and cellulose polysaccharide derivatives are the preferred choice for the enantioseparation of chiral compounds.1 These CSPs, which are widely used in both supercritical fluid chromatography (SFC) and high-performance liquid chromatography (HPLC), enable the separation of the enantiomers of a variety of compounds.1 SFC, with its higher efficiency, reduced environmental impact, faster separations, and ease of mobile phase removal, is a competitive choice for preparative-scale enantiomer separations. However, SFC has fewer alternative separation mechanisms compared to HPLC, and its mobile phase selection is relatively limited, offering fewer opportunities to manipulate separation selectivity based on its composition.1 In HPLC, separations can be carried out in the normal-phase (NPLC), reversed-phase (RPLC), hydrophilic interaction liquid chromatography (HILIC), and polar organic modes. At the analytical scale, RPLC and HILIC have advantages in the analysis of aqueous samples (e.g., biological, pharmaceutical, and environmental) and in coupling with mass spectrometry detection.2−5
Despite the advances, finding the right CSP/mobile phase (MP) combination for the separation of a pair of enantiomers remains a challenge;6−8 in fact, chiral separations are considered one of the most challenging of all analytical separations. The most common strategy is to test a series of CSP/MP combinations, a trial-and-error approach that often requires considerable experimental and economic effort and can lead to significant waste,6−9 which contradicts green chemistry principles and sustainable development goals (SDGs). Alternatively, quantitative structure-enantioselective retention relationships (QSERR) have emerged as a useful sustainable strategy to select a suitable CSP/MP combination in chiral HPLC optimization processes.10−18 In recent years, artificial intelligence (AI) has experienced rapid growth, leading to significant advances in various fields of science and technology.19,20 In analytical chemistry, AI has been applied to optimize and interpret data in various analytical techniques including "omics" analysis, biosensors, or microfluidics.21 In chromatography, AI has facilitated the identification and quantification of compounds 21,22 and method optimization.23,24
Among AI methodologies, artificial neural networks (ANNs) are a flexible machine learning approach capable of modeling complex/nonlinear relationships between input and response variables. ANNs are designed to learn from a training data set using a neural architecture that consists of interconnected artificial neurons (calculation units) arranged in several layers. However, due to the inner complexity of ANNs, it is difficult to extract information on how models work to predict the output from input variables. −34 Our research group, for the first time, used ANN to quantitatively estimate the enantioresolution (Rs) of a heterogeneous set of chiral molecules.32−35 These approaches include rule-based expert systems,35 random forest,15 and neural networks 33,34 and rely on large databases (e.g., ChirBase, data taken from the literature) to predict enantioselectivity 15,34 and retention times.31 In these studies, the CSP selection is based on the separation probability 33 or on the CSPs' enantioselectivity.15,34 Although these models have demonstrated some degree of success, they exhibit some limitations, mainly related to the quality of the data. Additionally, in some cases, the effects of the mobile phase are not considered,15,34 so predictions and recommendations are mainly limited to CSPs. Moreover, none of the suggested models was purpose-built to recommend chromatographic systems using Rs data.
In a scientific-mathematical problem, where a response depends on several variables, optimization aims to identify which are relevant (process of feature selection), or their appropriate values, to maximize or minimize an objective function related to the "goodness" of the response.36 Among different optimization strategies, metaheuristic optimization algorithms have shown good performance in optimizing complex and nonlinear problems.36 The neural network algorithm (NNA) is a new metaheuristic algorithm inspired by ANNs with high global search ability.37 However, this algorithm exhibits slow and premature (local) convergence when applied to complex problems. Recently, Zhang reported an improved NNA, named CCLNNA (chaotic neural network algorithm with competitive learning), to overcome these drawbacks.38 CCLNNA divides the population of solutions (vectors of parameters to optimize) into excellent and common subpopulations to improve the global search capability and incorporates learning strategies, such as the average position of the current population and the chaotic map, to avoid premature convergence. The modifications introduced in NNA significantly improve the optimization performance, making CCLNNA a powerful algorithm for solving complex multimodal optimization problems compared to other competing algorithms.38
This study presents a novel AI-based approach designed to provide a cost-effective and sustainable solution as an alternative to traditional trial-and-error methods commonly used in the selection of chromatographic systems for chiral analysis. The main objective is to evaluate the effectiveness of a consensus model composed of multiple ANNs as an intelligent recommendation system (IRS) for chiral chromatography systems. The IRS provides CSP/MP recommendations for the enantioseparation of chiral compounds based on a reduced set of their structural descriptors. The specific objectives to achieve this central goal are (i) to optimize the CSP/MP recommendation correctness of independent ANNs through an adapted CCLNNA algorithm; (ii) to collect those with the best recommending capability for appropriate chromatographic systems; (iii) to develop an ANN-consensus model that prioritizes chromatographic systems to be assayed in the laboratory; and (iv) to assess the potential of the recommendation model and its effectiveness. The hypothesis is that a satisfactory model potential, at the current proof-of-concept stage, could guide cost-benefit decisions regarding incorporating sufficient additional data for a final robust model. As far as we know, the CCLNNA algorithm has not been used before for feature selection or to optimize an ANN. In addition, an IRS specifically designed and optimized to directly recommend chromatographic systems has not been reported previously.
Separations were performed at a flow rate of 1 mL·min⁻¹ and a temperature of 25 °C. Most of the data were taken from previous papers,18,32,39 except those corresponding to CSPs A1 and A3 with MPs containing more than 80% acetonitrile and all methanol MPs, which were obtained experimentally in this work. All experiments for compounds N = 29, 34, 37, 51, and 56 with CSPs A1 and A3, and for compounds N = 4 and 26 with the cellulose CSPs C2, C3, C4, and C5, were also performed in this work, as were the experiments for omeprazole and tebuconazole (external test compounds). For each compound and CSP, among the different experimental Rs values obtained with the several MPs tested, the maximum Rs value was considered in this work. The experimental maximum Rs values were categorized using binary codes as 0 (Rs < 1.5; null or incomplete enantioresolution) or 1 (Rs ≥ 1.5; complete enantioresolution) and constitute the response matrix (T; see Table S1).
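The categorization step described above can be illustrated with a minimal Python sketch. The function name and the Rs values are hypothetical and chosen only for illustration; they are not taken from Table S1, and only the 1.5 threshold comes from the text.

```python
RS_THRESHOLD = 1.5  # Rs >= 1.5 is coded as complete enantioresolution (from the text)

def code_enantioresolution(rs_values):
    """Return (maximum Rs, binary code) for one compound on one CSP.

    rs_values: Rs obtained with the different mobile phases tested (hypothetical here).
    """
    best = max(rs_values)
    return best, 1 if best >= RS_THRESHOLD else 0

# Invented Rs values for illustration only.
print(code_enantioresolution([0.0, 0.8, 1.7]))  # (1.7, 1)
print(code_enantioresolution([0.0, 0.6, 1.2]))  # (1.2, 0)
```

Applying such a function to every compound and chromatographic system would yield one row of the binary response matrix T per compound.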
The descriptor matrix (X) consisted of 56 structural descriptors (see Table S2). These descriptors include 7 chiral carbon-related parameters (x1−x7) obtained as the number of atoms/groups bonded to the chiral atom (C*); e.g., C*a (x4) corresponds to the number of aliphatic groups directly bonded to the chiral atom. The rest (molecular descriptors and topological and hydrophobicity parameters) were estimated using MarvinSketch (©ChemAxon Ltd., version 21.8.0) and ChemSketch (©Advanced Chemistry Development, Inc., version 2020.2.0) software. Examples of these parameters are the molar mass (x8), the number of aromatic groups with 1,4 substitutions (x49), and the number of tertiary amino groups in aliphatic cycles (x53), among others. X variables were autoscaled as a pretreatment in this work.
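Autoscaling is commonly understood as centering each descriptor column to zero mean and scaling it to unit standard deviation. The sketch below assumes that definition and assumes the sample standard deviation (n − 1 denominator); the text does not state which variant was used, and the numeric values are hypothetical.

```python
from statistics import mean, stdev

def autoscale(column):
    """Center one descriptor column to zero mean and scale it to unit standard deviation."""
    m = mean(column)
    s = stdev(column)  # sample standard deviation (n - 1); an assumption, not stated in the text
    if s == 0:
        return [0.0 for _ in column]  # constant descriptor: nothing to scale
    return [(v - m) / s for v in column]

molar_mass = [200.0, 250.0, 300.0]  # hypothetical values for three compounds
print(autoscale(molar_mass))  # [-1.0, 0.0, 1.0]
```

After this pretreatment, descriptors expressed in very different units (e.g., counts of groups versus molar mass) contribute on a comparable scale.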
ANN Nomenclature and Strategies. In this study, ANN architecture design focused on regression. The input layer consisted of a data matrix of known structural descriptors (X) for the set of compounds (Table S2), while matrices of known T values (Table S1; training stage) or predicted Rs response values (Yc; prediction stage) were positioned in the output layer. We tested ANNs with a maximum of 2 hidden layers, with k neurons in the first hidden layer (between 1 and 30) and kk neurons in the second hidden layer (between 0 and 30; 0 means only one hidden layer). As in any regression model, ANN predictions (Yc) initially appeared as continuous data but were later converted into discrete values (Y; 0 or 1) according to the criteria depicted in Table S3. Comparing the Y values with T allowed us to determine whether the ANN recommendation was correct or not.
To preserve the ANN representativeness, given the limited number of available compounds, we decided to use just 6 compounds for validation (to mitigate ANN overfitting) and 2 test compounds for prediction. The remaining 48 compounds were reserved for training the ANN. The adapted CCLNNA algorithm automatically selected the validation and test compounds. In addition, as part of the optimization of the ANN, CCLNNA also determined the internal network complexity (k, kk) and included a feature selection process to reduce the number of descriptors (ND) for each ANN.
Software and Calculations. MATLAB R2022b (MathWorks) was used for adapting/programming the algorithms, in conjunction with its Deep Learning Toolbox library that contains algorithms for creating and training ANNs, as well as the original CCLNNA version.38 CCLNNA operates on solution vectors, initially defined as continuous parameters to be optimized. In this case, our adapted CCLNNA version includes a necessary modification of the original CCLNNA algorithm to convert the continuous values of each solution vector into integer values. As the solution vector, the following 66 indexes were used: 8 indexes corresponding to the selected validation and test compounds (N values in Table S1), 2 indexes corresponding to the inner ANN architecture (k and kk between the above indicated limits), and 56 indexes corresponding to the descriptors, whose values in the 0−1 range were converted to 0 or 1 (threshold value = 0.8) indicating the presence or absence, respectively, of each descriptor in the model.
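The layout of the 66-index solution vector can be illustrated with a short Python sketch (the original work used MATLAB). Several details are assumptions not given in the text: plain rounding is used for the integer conversion, values at or above the 0.8 threshold are mapped to 1, and the first six compound indexes are taken as validation and the last two as test. The function name and the example vector are hypothetical, and the sketch only returns the binarized descriptor entries without interpreting them further.

```python
def decode_solution(vector, threshold=0.8):
    """Split a 66-element solution vector into compounds, architecture, and descriptor flags."""
    if len(vector) != 66:
        raise ValueError("expected 8 + 2 + 56 = 66 entries")
    held_out = [round(v) for v in vector[:8]]    # compound numbers N (rounding is an assumption)
    k, kk = round(vector[8]), round(vector[9])   # neurons in hidden layers 1 and 2
    flags = [1 if v >= threshold else 0 for v in vector[10:]]  # binarized descriptor entries
    return {
        "validation": held_out[:6],  # ordering of validation vs. test is an assumption
        "test": held_out[6:],
        "k": k,
        "kk": kk,
        "descriptor_flags": flags,
    }

# Invented continuous solution vector for illustration only.
example = [1.2, 5.6, 9.9, 20.1, 33.4, 47.8, 50.0, 55.3, 12.4, 0.2] + [0.9, 0.1] * 28
decoded = decode_solution(example)
print(decoded["validation"], decoded["test"], decoded["k"], decoded["kk"])
# [1, 6, 10, 20, 33, 48] [50, 55] 12 0
```

Encoding the data split, the architecture, and the feature selection in a single vector is what lets one optimizer search all three jointly.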
An additional modification to the CCLNNA algorithm includes maximizing a target function, named PScoreo (penalized score considering overfitting), to achieve the objectives of this study. The PScoreo for the ANN model is a mean value calculated from the individual contributions of each compound i (PScoreo_i) as follows: The following equation is proposed for calculating PScoreo_i where "Compound role" has a value equal to 1, 2, or 3 depending on whether compound i is assigned to the training, validation, or test subset, respectively. The inclusion of the term "Compound role" in PScoreo_i aimed to mitigate overfitting by penalizing errors concentrated on test and validation compounds. "Success" and "Attempt" values were computed based on the criteria outlined in Table S3. For a given compound i, the optimized ANN provided a Yc output associated with a given CSP/MP system. We assigned a coded Y value of 1 (if Yc ≥ 0.5) or 0 (if Yc ≤ 0.25). For intermediate Yc values (0.25 < Yc < 0.5), both Y values (0 or 1) were considered. A "Success" value was assigned based on the agreement between the Y and T values. "Attempt" refers to the number of attempts required to achieve a Success > 0. For each compound i, up to three CSP/MP system recommendations (R1, R2, and R3) were considered. These corresponded to the three highest Yc values predicted by the ANN in descending order.
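The Yc coding thresholds and the ranking of the three recommendations described above can be sketched as follows. This is an illustrative Python sketch only: the function names and the Yc values are hypothetical, and the "Success" and "Attempt" rules of Table S3 are not reproduced here.

```python
def code_prediction(yc):
    """Map a continuous ANN output Yc to the candidate binary codes given in the text."""
    if yc >= 0.5:
        return (1,)
    if yc <= 0.25:
        return (0,)
    return (0, 1)  # intermediate output: both codes are considered

def top_recommendations(yc_by_system, n=3):
    """Return the n CSP/MP system numbers with the highest Yc, in descending order."""
    ranked = sorted(yc_by_system, key=yc_by_system.get, reverse=True)
    return ranked[:n]

# Invented ANN outputs for one compound, keyed by CSP/MP system number.
yc = {11: 0.92, 12: 0.97, 3: 0.40, 7: 0.10}
print(top_recommendations(yc))                                      # [12, 11, 3]
print([code_prediction(yc[s]) for s in top_recommendations(yc)])    # [(1,), (1,), (0, 1)]
```

In this invented case, R1 would be system 12, R2 system 11, and R3 system 3, the last with an ambiguous code.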
Note that the assignment of "Success" and "Attempt" values is based on a personal assessment, similar to how an analyst in the laboratory would evaluate the usefulness of a given recommendation.The rules in Table S3 were used solely for optimizing each ANN.Other customized equations and rules could be considered, but such exploration is beyond the scope of this proof-of-concept stage.
Two main CCLNNA parameters were configured: the maximum number of iterations (MaxIt) varied from 150 to 300, and the solution population size (nPop) ranged from 50 to 150.

■ RESULTS AND DISCUSSION
Intelligent Recommendation System (IRS). An IRS specifically designed to assist in CSP/MP system selection can replace traditional trial-and-error methods, human recommenders (if they exist), and models focused only on predictive or classifying abilities. The IRS would learn from molecular descriptors and enantioresolution data to provide hierarchical CSP/MP recommendations (R1 > R2 > R3) or to signal unfeasible enantioresolution. The IRS's strength lies in optimizing its recommendation capability. This concept could spur further research for a broader CSP/MP-IRS. Neural network arrangements (CCLNNA, ANNs) seem to be suitable for the expected complexity. See Table S4 for further exploration of the topic.
ANNs Optimization. The optimization of ANNs (to maximize the PScoreo target function) was conducted using the CCLNNA algorithm, as outlined in the Experimental Section. A total of 30 CCLNNA-ANN processes were executed. We ranked the resulting ANNs in descending order of their PScoreo values. For instance, Table 1 presents the top 11 ANNs ranked by their PScoreo.
Table S1 shows that 44 of the compounds studied are completely enantioresolved using one or more CSP/MP systems, while 12 compounds cannot be enantioresolved with any of the systems tested. This implies that an ideal IRS would perform 44 "attempts"; that is, the first recommendation (R1) would be sufficient to achieve complete enantioresolution on the first "attempt", as nonenantioresolved compounds do not require an "attempt" (see Table S3). Thus, for each ANN, it is possible to calculate the difference between the total attempts required to achieve complete enantioresolution and this minimum value (44). Table 1 also shows the number of "extra attempts" for the top 11 ANNs.
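The "extra attempts" figure described above is a simple difference, as the following Python sketch shows. The attempt counts are invented for illustration, and the sketch assumes that nonenantioresolved compounds contribute zero attempts, as stated in the text; the full rules of Table S3 are not reproduced.

```python
def extra_attempts(attempts_per_compound, resolvable):
    """Attempts used beyond the ideal of one attempt per resolvable compound."""
    ideal = sum(1 for r in resolvable if r)  # 44 for the data set described in the text
    return sum(attempts_per_compound) - ideal

# Invented example: 44 resolvable compounds (40 solved at the first attempt,
# 3 at the second, 1 at the third) and 12 compounds needing no attempt.
attempts = [1] * 40 + [2] * 3 + [3] + [0] * 12
resolvable = [True] * 44 + [False] * 12
print(extra_attempts(attempts, resolvable))  # 5
```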
The main finding from the examination of the 30 ANNs was the small number of failures in ANN recommendations (only 5 ANNs failed to provide a correct recommendation for one or two compounds within the 3 allowed attempts). This suggests the high effectiveness of the CCLNNA-ANN combination. ANNs with a PScoreo value >112 consistently provided correct recommendations for all compounds. On the other hand, the "extra attempt" values exhibited significant variability, ranging from 2 to 20 for the 30 ANNs. Additional insights were as follows: (i) An inverse correlation between the PScoreo and the "extra attempts" values, as expected. This relationship became more evident as PScoreo increased. (ii) PScoreo did not show clear relationships with nPop, ANN internal complexity (k + kk), or ND. (iii) A modest positive correlation between PScoreo and MaxIt, although some satisfactory ANNs were achieved with only 150 iterations.
Among the 30 ANNs, the 11 ANNs in Table 1 exhibited PScoreo > 115, suggesting a low probability of failure for new compounds, and the lowest "extra attempt" values (≤5), suggesting low experimental effort in a future application. However, they differ in complexity. The simplest ANNs, with one hidden layer (kk = 0; in bold in Table 1) and a small number of descriptors (ND ≤ 16; in bold in Table 1), are expected to be more robust. On the other hand, the CCLNNA automatically selected different validation and test object subsets for each ANN (the most frequent are also bolded in Table 1).
ANNs Consensus Model as IRS. Despite the impressive performance and reasonable number of "extra attempts" by the top-performing individual ANNs (referenced in Table 1), we opted for an ANN-consensus model to establish a more reliable IRS. This model combines the recommendations made by multiple ANNs to provide the predominant one as a single recommendation. The rationale behind this approach is to reinforce the reliability of the recommendations while trying to minimize the number of "extra attempts". From the available options, we chose to evaluate an ANN-consensus model comprising the five ANNs from Table 1 with the lowest number of descriptors (ND ≤ 16) and "extra attempts" ≤ 4. These five ANNs are highlighted in Table 1 (ANN1−ANN5). The joint decisions of these five ANNs would consensually determine the final CSP/MP system recommendation (IRS recommendation). The simplest option was to look at the first recommendation (R1) from each ANN and select the most frequent of the five R1 as the IRS recommendation.
Table 2 shows the CSP/MP systems corresponding to R1 provided by each of the five ANNs comprising the ANN-consensus model for those compounds having a most frequent R1 value (i.e., IRS recommendation). Table 2 also shows the estimated (i.e., Y estimation) and experimental (i.e., T; Table S1) categorical enantioresolution values corresponding to IRS recommendations. CSP/MP system = 0 is used for those compounds with Y = 0 values.
The comparison of Y and T data allows us to check the performance of the ANN-consensus model. For instance, for compound N = 1, three ANNs recommend the CSP/MP system 12 (Amylose3/acetonitrile, A3a), while the remaining two suggest system 11 (Amylose1/acetonitrile, A1a). The IRS recommendation would be system 12 (i.e., the IRS predicts complete enantioresolution with this chromatographic system, Y = 1), consistent with the experimental observation in Table S1. Note that system 11 would also produce the correct output. In a future application of the model for selecting the appropriate chromatographic system, the analyst would need only one confirmatory laboratory test (the minimum necessary) to verify the IRS recommendation. In this case, "extra attempts" would be 0.
Compound N = 2, for which the model suggests no enantioresolution in any of the CSP/MP systems (R1 = 0 for all of the ANNs), represents the opposite case. The prediction (Y = 0) agrees with the experimental observation (T = 0 in all CSP/MP systems, see Table S1). In this case, no confirmatory tests are necessary (and would not be carried out in practice); thus, "extra attempts" would be 0.
Given that the IRS recommendation for all compounds in Table 2 is correct (Y = T; in the first attempt), the number of "extra attempts" required is zero, demonstrating the outstanding performance of the ANN-consensus model.
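The most-frequent-R1 rule used for Table 2 can be sketched in Python as a simple vote count. The function name is hypothetical, and treating only exact ties for first place as "no dominant recommendation" is an assumption; the text does not define the criterion precisely. The first two calls mirror the cases of compounds N = 1 and N = 2 described above (the order of the votes is arbitrary).

```python
from collections import Counter

def consensus_r1(r1_votes):
    """Return the most frequent R1 system, or None when no single system dominates."""
    counts = Counter(r1_votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie for first place: treated here as an undefined consensus
    return counts[0][0]

print(consensus_r1([12, 12, 12, 11, 11]))  # 12 (three ANNs vs. two, as for compound N = 1)
print(consensus_r1([0, 0, 0, 0, 0]))       # 0 (no enantioresolution, as for compound N = 2)
print(consensus_r1([1, 2, 3, 4, 5]))       # None (all five R1 differ)
```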
However, Table 2 includes 84% of the compounds. The rest of the compounds (N = 8, 12, 15, 16, 23, 31, 51, 53, 55) present decision-making ambiguities due to the lack of a dominant R1 recommendation (see Table 3). This can be particularly problematic for cases such as compounds N = 12, 16, and 31, where the five ANN R1 recommendations differ.
To address these challenges, we devised a different approach for the ANN-consensus model, incorporating the second recommendation (R2) from each ANN. Table 3 shows the results of the R1 and R2 recommendations for compounds not included in Table 2. Additionally, it includes a new parameter determining the reliability of the IRS recommendation (on a scale from 0 to 1), using the following equation that weights R1 and R2 recommendations:

$$\text{reliability} = (\text{frequency } R_1) + 0.5\,(\text{frequency } R_2)$$

The most reliable recommendation in Table 3 was selected as the IRS recommendation. Again, we found excellent results in terms of IRS recommendation correctness (there is agreement between Y and T values in all cases) and experimental effort minimization (zero "extra attempts"). For compound N = 12, two IRS recommendations (CSP/MP = 11, 12) with the same reliability were obtained; both are correct and would imply zero "extra attempts".
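The reliability weighting can be illustrated with a brief Python sketch. It assumes that "frequency" means the fraction of the five ANNs giving a system as R1 or R2, which is consistent with the stated 0 to 1 scale but is not spelled out in the text. The function name and the votes are hypothetical and are not the values of Table 3.

```python
from collections import Counter

def reliability(r1_votes, r2_votes):
    """reliability = freq(R1) + 0.5 * freq(R2), with frequencies as fractions of the ANNs."""
    n = len(r1_votes)
    f1, f2 = Counter(r1_votes), Counter(r2_votes)
    systems = set(r1_votes) | set(r2_votes)
    return {s: (f1[s] + 0.5 * f2[s]) / n for s in systems}

# Invented R1 and R2 votes from five ANNs for one compound.
scores = reliability([11, 12, 3, 5, 7], [12, 11, 11, 12, 9])
best = max(scores.values())
print(sorted(s for s, v in scores.items() if v == best))  # [11, 12]
```

In this invented case two systems share the highest reliability, so both would be reported, which is the kind of tie the text describes for compound N = 12.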
Such approaches and equations can significantly aid the decision-making process and may be customized by the analyst if convenient. For instance, the first three recommendations could be incorporated (e.g., in cases such as compound N = 12) to try to increase the reliability and to eliminate the human factor from the decision-making process.
An additional benefit of the ANN-consensus model is that it can assess which structural descriptors are most frequently involved in the assessment, making the consensus model a sort of indirect method for establishing the relative importance of descriptors (see the Supporting Information; Figure S1).
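One simple way to obtain such a frequency-based view is to count in how many ANNs of the consensus model each descriptor was selected. The Python sketch below is only an illustration of that idea; it is not necessarily how Figure S1 was produced, and the function name and descriptor selections are hypothetical.

```python
from collections import Counter

def descriptor_frequency(selected_by_ann):
    """Count in how many ANNs each descriptor index was selected, most frequent first."""
    counts = Counter()
    for selected in selected_by_ann:
        counts.update(set(selected))  # count each descriptor once per ANN
    # Sort by decreasing count, then by descriptor index for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Invented descriptor selections (indices of x) for three ANNs.
print(descriptor_frequency([[4, 8, 49], [8, 53], [8, 49]]))
# [(8, 3), (49, 2), (4, 1), (53, 1)]
```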
Application Examples of the IRS in a Proof-of-Concept Context. The current IRS was applied to two external test compounds (omeprazole and tebuconazole; unknown T) belonging to the families studied, solely to further illustrate its potential. The CSP/MP systems recommended by the IRS for these compounds were later experimentally confirmed (i.e., Y = T; Figure 1), underscoring its potential and advocating for continued research to move beyond the proof-of-concept stage.

■ CONCLUSIONS
We propose new algorithmic tools capable of recommending chiral chromatography systems for the enantioresolution of a heterogeneous set of compounds from selected structural molecular descriptors. For the first time (to the best of our knowledge), an adapted CCLNNA optimization algorithm has been combined with ANNs; thus, two different neural network approaches are merged (CCLNNA-ANN). The approach developed includes an autonomous CCLNNA strategy, guided by the ANN outputs themselves, involving (i) compound subset selection, (ii) ANN architecture optimization, and (iii) feature selection on descriptors. The overall strategy has been designed for comparing different ANNs (according to a new fit-for-purpose customizable PScoreo equation) based on the correctness of the recommended CSP/MP system and minimal experimental verification effort.
The following conclusions can be drawn from the results obtained in this work, limited to its aim: (i) The optimization of fit-for-purpose ANNs (directed to rank the chromatographic systems for complete enantioresolution of a given neutral or basic chiral compound based on its structure) by means of the adapted CCLNNA algorithm has the potential to be a practical and effective approach. (ii) The CCLNNA-ANN approach provides excellent results with a very low number of assessment failures. (iii) The small number of experimental tests beyond the planned ones (extra attempts) needed to verify the recommendations allows the most suitable ANNs to be selected. (iv) The best ANNs can form a consensus model, contributing to increasing the potential of the proposed strategy. The first recommendation (R1) of each ANN forming the IRS recommendation provides the right CSP/MP system for most of the compounds studied. In the case of ambiguity, the second recommendation (R2) of each ANN facilitates the decision-making process. For the case under study, the ANN-consensus model has an outstanding performance since there is full agreement between the IRS recommendation and the experimental results. (v) The IRS (combining CCLNNA-ANN and ANN-consensus model strategies), in the framework of artificial intelligence, has shown enough potential to provide a simple solution to a highly complex problem (a tool to be used by analysts interested in the enantioseparation of a given chiral compound). Thus, the collection of more experimental data (more compounds from more families, more chiral stationary phases and mobile phases, and perhaps more descriptors) is encouraged to derive a single ANN or an ANN-consensus model able to recommend a suitable chiral chromatographic system for any future compound, accompanied by an improved probability of correctness.
Compounds, family, and binary codes for categorical enantioresolution (matrix T) for the 14 CSP/MP systems studied; composition of the mobile phases; structural descriptors used for modeling; customized criteria applied during CCLNNA optimization; exploration of the topic intelligent recommendation system (IRS); and an approximation to the relative importance of the molecular descriptors (PDF)

Figure 1. Application examples of the IRS for the external test compounds omeprazole and tebuconazole and experimental results.

Table 1. Main Features of the Top 11 ANNs with the Highest PScoreo Valuesᵃ
ᵃ See further details in the Experimental Section. ᵇ ANNs selected to evaluate a consensus model as IRS.

Table 2. CSP/MP Systems Corresponding to the R1 Recommendation for the Five Selected ANNs and the IRS Recommendation (the Most Frequent R1) Together with Their Corresponding Categorical (Y) and Experimental (T) Enantioresolutionᵃ
ᵃ See the Experimental Section to identify the CSP/MP systems. ᵇ Compounds not listed (N = 8, 12, 15, 16, 23, 31, 51, 53, 55) have an undefined consensus outcome.

Table 3. CSP/MP Systems Corresponding to the R1 and R2 Recommendations for the Five Selected ANNs and the IRS Recommendation Together with Their Corresponding Reliability, Categorical (Y) and Experimental (T) Enantioresolutionᵃ
ᵃ See the Experimental Section to identify the CSP/MP systems.